For the dataset smashy_super the target is yval, a log-loss performance measure; values close to 0 mean good performance. First of all, we want to know which parameters are important in general.
We need to load the packages and subset the data so that we can compare the whole dataset with the subset containing the 20% of configurations with the best outcome. In addition, the data must be prepared to make it easier to summarize and filter.
library(VisHyp)
library(mlr3)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
smashy_super <- readRDS("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/package_VisHyp/data-raw/smashy_super.rds")
smashy_super <- as.data.frame(smashy_super)
# Convert logical and character columns to factors for the mlr3 task
for (i in seq_along(smashy_super)) {
  if (is.logical(smashy_super[, i]) || is.character(smashy_super[, i])) {
    smashy_super[, i] <- as.factor(smashy_super[, i])
  }
}
superTask <- TaskRegr$new(id = "smashy_super", backend = smashy_super, target = "yval")
superBest <- smashy_super[smashy_super$yval >= quantile(smashy_super$yval, 0.8),]
superTaskBest <- TaskRegr$new(id = "taskBest", backend = superBest, target = "yval")
The target parameter yval can reach values between -0.3732 and -0.2105. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.2105.
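Since higher (less negative) yval values are better here, `quantile(yval, 0.8)` gives the cutoff above which the best 20% of configurations lie, as used for superBest above. A minimal sketch with synthetic numbers (not from smashy_super) illustrates the cutoff:

```r
# Synthetic stand-in for yval: higher (closer to 0) is better
set.seed(1)
yval <- runif(100, min = -0.38, max = -0.21)

# quantile(., 0.8) is the value above which the best 20% lie
cutoff <- quantile(yval, 0.8)
best <- yval[yval >= cutoff]

length(best)                           # 20 of the 100 values survive
all(best > max(yval[yval < cutoff]))   # every kept value beats every dropped one
```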
The “random” samples perform better on average than the “bohb” samples. Among the top 20% of configurations, many “bohb” samples have been sorted out, but the remaining ones perform better on average than the “random” samples. In the end, both sample types can lead to good performance values, but since most of the remaining samples are “random”, we choose “random”.
In general, for the parameter survival_fraction lower values perform better than higher values. Both subsets start with a low value and reach their maximum value directly afterwards. For the top configurations, higher values do not seem to be worse, so with good configurations of the other parameters the value of this parameter can also be high. Although not all high values perform poorly, lower values seem to be the right choice, since most good configurations have lower values. A value between 0.05 and 0.30 seems to be a good choice for the “knn1” surrogate_learner.
The surrogate_learner parameter is one of the most important parameters for the whole dataset. After reducing the dataset to the best 20% of configurations, we saw that the parameter lost importance, since the best configurations mainly used the “knn1” surrogate_learner. Even though the best configuration of every other surrogate_learner achieved a better yval than the best “knn1” configuration, it makes sense to choose “knn1” because of its better results on average.
The most important parameter for the best 20% of the configurations was random_interleave_fraction. In this case, the results were unambiguous: higher values led to better results for both the full dataset and the subset. Another early indicator in the analysis was the comparison of the summaries of the full and split datasets, where all summary statistics for the subset were higher. All effect tools, such as the PDP, PCP, and heatmap, showed these results as well. For our purpose, we only take values above 0.5, i.e., roughly the upper half.
A similar problem occurs with budget_log_step. In the full dataset, higher values are better, but in the top 20% of configurations, lower values achieve better yval values. Unlike random_interleave_fraction, there are more configurations with good results in the split dataset. It is also a very important parameter for the top 20% of configurations, so it should not be overlooked that good performance can be achieved with lower budget_log_step values. In this case, it is better not to constrain the parameter.
In the best configurations combined with the “knn1” surrogate_learner, filter_factor_first was the most important parameter, while in the full dataset this parameter was not important at all. There is also a difference in the range of good configurations: in the full dataset, values above 6 did not perform well, while in the subdivided dataset, values above 6 produced the best results. Even after subdividing into the best 20% of configurations, the majority of good values were above 4, so values above 4 seem to be a good choice for this parameter.
The interpretation of filter_factor_last was a little more complicated. filter_factor_last fluctuates strongly, and the good ranges differ depending on whether we look at the full or the partial dataset. Moreover, although the importance is high due to the large fluctuations, the range of predicted performances is not very large (which actually calls the importance into question). In general, however, the value of filter_factor_last should be between 1.5 and 2.5, or above 5.5; at the very least, it should not be between 4 and 5.
A really good parameter to interpret is filter_with_max_budget. This parameter is not really important in the full dataset, but for the best configurations in combination with “knn1” one can say that “TRUE” should be the choice.
filter_algorithm, filter_select_per_tournament and random_interleave_random have barely an effect and therefore do not need to be limited.
To verify the proposed parameter configurations, we constrain the dataset and compare the obtained performance with the ranks of the performance of the whole dataset.
final <- smashy_super[smashy_super$sample == "random",]
final <- final[final$survival_fraction > 0.05 & final$survival_fraction < 0.3,]
final <- final[final$surrogate_learner == "knn1",]
final <- final[final$random_interleave_fraction > 0.5,]
final <- final[final$filter_factor_first > 4,]
final <- final[final$filter_factor_last < 4 | final$filter_factor_last > 5,]
final <- final[final$filter_with_max_budget == "TRUE",]
yval <- sort(final$yval, decreasing = TRUE)
yval_original <- sort(smashy_super$yval, decreasing = TRUE)
sort(match(yval, yval_original), decreasing = FALSE)
## [1] 20 40 49 58 62 69 79 107 112 115 116 130 152 161 162
## [16] 178 182 184 189 206 208 218 238 241 242 264 274 276 277 280
## [31] 295 296 300 305 318 319 331 332 336 340 356 377 378 382 388
## [46] 393 404 432 434 442 446 450 452 486 489 490 501 509 513 534
## [61] 539 547 550 568 578 602 604 605 621 626 632 637 661 664 682
## [76] 722 731 737 744 754 774 794 818 823 838 839 853 920 922 971
## [91] 987 996 1152 1292 1468 1470
We can see that many good results were obtained, but by no means all of the best configurations were found. This can be explained by the fact that we repeatedly imposed constraints to reduce the size of the dataset. For example, for some categorical parameters we always chose one level, even though we knew that other levels could also yield good values. Furthermore, some numerical parameters were restricted although it was known that some very good configurations achieve very good yval values outside the chosen range. In the end, however, we were able to show that the restricted ranges lead almost exclusively to above-average or good performance values.
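The sort/match idiom used for the rank check above can be sketched with made-up performance values:

```r
# Toy version of the rank check: where do the selected performances
# rank within the full set? (higher value = better)
full   <- c(0.9, 0.5, 0.7, 0.2, 0.8, 0.4)  # all observed performances
picked <- c(0.8, 0.4)                      # performances of the constrained subset

full_sorted   <- sort(full, decreasing = TRUE)
picked_sorted <- sort(picked, decreasing = TRUE)

# match() returns the position of each picked value in the full ranking
sort(match(picked_sorted, full_sorted))    # 2 5: the 2nd- and 5th-best overall
```

A rank vector concentrated near the top (as in the output above, relative to 2845 configurations) indicates that the constraints kept mostly good configurations.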
This can be checked visually with the implemented PCP. For a better overview, the color range is somewhat restricted, since there are very few observations below -0.3. For a better comparison, both the presumed good range and the presumed worse range of the parameter configurations are shown once.
plotParallelCoordinate(superTask, labelangle = 10, colbarrange = c(-0.21, -0.3))
knitr::include_graphics("Super_Best_PCP.png")
knitr::include_graphics("Super_Bad_PCP.png")
First, we get an overview of the data again.
head(smashy_super)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## 1 0.11449875 0.26100298 knn7 FALSE
## 2 -0.42921649 0.33760502 knn7 TRUE
## 3 0.04823162 0.01486055 knn7 TRUE
## 4 -1.44318828 0.57712483 knn7 TRUE
## 5 0.37983696 0.16755070 bohblrn FALSE
## 6 0.11449875 0.85519272 knn7 FALSE
## filter_factor_first random_interleave_fraction random_interleave_random
## 1 0.233780263 0.2254148 TRUE
## 2 3.756367542 0.1042924 TRUE
## 3 1.002387921 0.5424223 FALSE
## 4 6.404499751 0.6294822 TRUE
## 5 0.004248442 0.7319387 TRUE
## 6 5.105712766 0.6763331 TRUE
## sample filter_factor_last filter_algorithm filter_select_per_tournament
## 1 bohb 0.3870927 progressive 2.2749194
## 2 random 1.5890745 progressive 2.2996638
## 3 random 2.9274948 progressive 1.9313954
## 4 bohb 1.8534344 tournament 1.7707135
## 5 random 4.0016987 tournament 2.2842471
## 6 bohb 3.8174711 tournament 0.4610276
## yval
## 1 -0.2205114
## 2 -0.2158789
## 3 -0.2123531
## 4 -0.2121151
## 5 -0.2117795
## 6 -0.2186847
str(smashy_super)
## 'data.frame': 2845 obs. of 12 variables:
## $ budget_log_step : num 0.1145 -0.4292 0.0482 -1.4432 0.3798 ...
## $ survival_fraction : num 0.261 0.3376 0.0149 0.5771 0.1676 ...
## $ surrogate_learner : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 3 1 3 3 3 4 4 ...
## $ filter_with_max_budget : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 1 1 2 2 1 2 ...
## $ filter_factor_first : num 0.23378 3.75637 1.00239 6.4045 0.00425 ...
## $ random_interleave_fraction : num 0.225 0.104 0.542 0.629 0.732 ...
## $ random_interleave_random : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 2 2 2 1 1 2 1 ...
## $ sample : Factor w/ 2 levels "bohb","random": 1 2 2 1 2 1 1 2 2 2 ...
## $ filter_factor_last : num 0.387 1.589 2.927 1.853 4.002 ...
## $ filter_algorithm : Factor w/ 2 levels "progressive",..: 1 1 1 2 2 2 1 1 2 2 ...
## $ filter_select_per_tournament: num 2.27 2.3 1.93 1.77 2.28 ...
## $ yval : num -0.221 -0.216 -0.212 -0.212 -0.212 ...
We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).
plotImportance(task = superTask)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(task = superTaskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
For the full dataset, surrogate_learner is the most important hyperparameter and sample the second most important. After filtering the dataset, both parameters lose much of their importance and have little effect, so random_interleave_fraction becomes the most important parameter. Parameters like filter_algorithm, random_interleave_random and filter_with_max_budget have no effect on either the full or the filtered dataset.
After we have subdivided the data, we also want to look for structural changes in the summary.
summary(smashy_super)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.7509 Min. :0.0001849 bohblrn: 374 FALSE:1119
## 1st Qu.:-0.8770 1st Qu.:0.1864801 knn1 :1658 TRUE :1726
## Median :-0.0860 Median :0.3550278 knn7 : 478
## Mean :-0.2054 Mean :0.4194451 ranger : 335
## 3rd Qu.: 0.4727 3rd Qu.:0.6533882
## Max. : 1.0186 Max. :0.9999182
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.004248 Min. :0.000615 FALSE:1624
## 1st Qu.:2.454531 1st Qu.:0.308627 TRUE :1221
## Median :4.393864 Median :0.545574
## Mean :4.066960 Mean :0.536262
## 3rd Qu.:5.794467 3rd Qu.:0.774285
## Max. :6.906027 Max. :0.999015
## sample filter_factor_last filter_algorithm
## bohb :1226 Min. :0.004248 progressive: 909
## random:1619 1st Qu.:2.268931 tournament :1936
## Median :4.183293
## Mean :3.911979
## 3rd Qu.:5.670457
## Max. :6.906027
## filter_select_per_tournament yval
## Min. :0.0009299 Min. :-0.3732
## 1st Qu.:1.0000000 1st Qu.:-0.2390
## Median :1.0000000 Median :-0.2331
## Mean :1.0740216 Mean :-0.2347
## 3rd Qu.:1.0869452 3rd Qu.:-0.2278
## Max. :2.3956034 Max. :-0.2105
summary(superBest)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.74596 Min. :0.000291 bohblrn: 2 FALSE:127
## 1st Qu.:-0.46235 1st Qu.:0.121852 knn1 :546 TRUE :442
## Median : 0.25398 Median :0.256286 knn7 : 19
## Mean : 0.04121 Mean :0.320271 ranger : 2
## 3rd Qu.: 0.61932 3rd Qu.:0.433896
## Max. : 1.01297 Max. :0.992048
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.004248 Min. :0.02443 FALSE:337
## 1st Qu.:3.697472 1st Qu.:0.43278 TRUE :232
## Median :5.308223 Median :0.63116
## Mean :4.710573 Mean :0.61323
## 3rd Qu.:6.174077 3rd Qu.:0.82455
## Max. :6.899001 Max. :0.98931
## sample filter_factor_last filter_algorithm
## bohb :156 Min. :0.1005 progressive:202
## random:413 1st Qu.:2.7705 tournament :367
## Median :4.8008
## Mean :4.3414
## 3rd Qu.:6.0197
## Max. :6.8990
## filter_select_per_tournament yval
## Min. :0.001125 Min. :-0.2270
## 1st Qu.:1.000000 1st Qu.:-0.2261
## Median :1.000000 Median :-0.2249
## Mean :1.055841 Mean :-0.2244
## 3rd Qu.:1.000000 3rd Qu.:-0.2234
## Max. :2.381424 Max. :-0.2105
This summary already explains why the parameter surrogate_learner lost most of its importance: many bohblrn, knn7 and ranger configurations were filtered out, which could mean that these learners perform worse on average than the knn1 learner. For the parameter filter_with_max_budget, a disproportionate number of configurations with FALSE were filtered out, which could mean that TRUE values perform better on average. It is also noticeable that the summary values of survival_fraction have decreased, while those of budget_log_step, filter_factor_first and random_interleave_fraction have increased. Finally, a disproportionate number of “bohb” samples also dropped out of the dataset. Perhaps this is an indication that “random” samples gave better results.
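The “disproportionate” drop-out can be quantified as a retention rate per factor level. A minimal sketch with synthetic data (the learner and score columns here are made up, not taken from smashy_super):

```r
# Toy illustration: which share of each factor level survives a top-20% filter?
set.seed(42)
learner <- factor(sample(c("knn1", "knn7", "bohblrn", "ranger"),
                         size = 200, replace = TRUE,
                         prob = c(0.5, 0.2, 0.2, 0.1)))
score <- rnorm(200) + as.numeric(learner == "knn1")  # "knn1" scores higher on average

best <- learner[score >= quantile(score, 0.8)]

# Retention rate per level: levels that perform well keep a larger share,
# which is exactly the disproportionate drop-out seen in the summaries
retention <- table(best) / table(learner)
round(retention, 2)
```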
The hyperparameters will be examined more closely in the following sections.
As we found out, “sample” is again an important parameter in the full dataset and can take the values “bohb” or “random”. This parameter should have the right value for good performance. Therefore, let us consider the effects of the parameter in a partial dependence plot. We also check whether the effect applies to all parameters; a heatmap gives us a quick overview of interactions. Values close to 1 have barely an effect on the outcome.
plotPartialDependence(superTask, features = c("sample"), rug = FALSE, plotICE = FALSE)
subplot(
plotHeatmap(superTask, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(superTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5,shareX = TRUE)
In the PDP, it can be seen that “random” samples lead to better target values on average than “bohb” samples. In the heatmaps, it can be seen that the predicted performances may be better when filter_with_max_budget is set to “TRUE”, random_interleave_fraction has a high value and survival_fraction has a low value. As suspected from the summary, the surrogate_learner “knn1” seems to give the best results on average.
We can split the data according to the best 20% of yval values in the dataset and check whether the outcome of a PDP is different.
plotPartialDependence(superTaskBest, features = c("sample"), rug = TRUE, plotICE = TRUE)
A lot of “bohb” samples were sorted out, but the remaining ones perform better on average than the “random” samples. Since both subsets seem important for further analysis, and since the parameter lost much of its importance after filtering, we suspect differences between “random” and “bohb” samples. Therefore, we split the dataset into “bohb” and “random” samples.
random <- smashy_super[smashy_super$sample == "random",]
bohb <- smashy_super[smashy_super$sample == "bohb",]
randomSubset <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbSubset <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")
Let us check whether there are differences in the importance of the parameters between the “random” subset and the “bohb” subset.
plotImportance(task = bohbSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(task = randomSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
The hyperparameters surrogate_learner and random_interleave_fraction are still the most important parameters for both partial datasets. In fact, the importance did not change much.
There is little difference between the two samples in the full data set. We did find that the majority of the good results were obtained with the “random” samples, but for further analysis we will look at both the “random” subset and the “bohb” subset.
The survival_fraction parameter was moderately important for both samples of the entire dataset, but based on the summary we assumed that low values may lead to better performance. This parameter can take values between 0.00007 and 0.9998. Let us explore this assumption with a PDP.
plotPartialDependence(bohbSubset, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In general, lower values perform better than higher values. Both subsets start with a low value and reach their maximum directly afterwards. This means that the value should probably be low, but not minimal. For both subsets, the best range seems to be between 0.05 and 0.25. While the “random” curve is almost monotonically decreasing, the “bohb” curve has another peak between 0.5 and 0.75.
One possibility to analyze this structure is to filter the data again. For this, we split the data according to the best 20% of yval values within each sample. We can review the “bohb” samples with ICE curves, which can show the heterogeneous relationship between the parameter survival_fraction and the performance parameter yval created by interactions.
bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8),]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")
randomBest <- random[random$yval >= quantile(random$yval, 0.8),]
randomBestTask <- TaskRegr$new(id = "randomBestTask", backend = randomBest, target = "yval")
plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = TRUE)
plotPartialDependence(randomBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = TRUE)
In this case, higher values do not seem to be worse. This is surprising, since in the general case low values were better. It could mean that with good configurations of the other parameters, the survival_fraction parameter even gives better results when a high value is chosen. This could also explain the bump in the range between 0.5 and 0.75 for the “bohb” sample. Looking at the rug, we see that most configurations lie below 0.5 and the fewest lie above 0.75. Because of the few configurations with high values, the effect of good performances in this range is less pronounced. In the range between 0.5 and 0.75 there are more configurations, which therefore have a greater impact on the average curve. Although not all high values have poor performance, lower values seem to be the right choice, since most good configurations have lower values.
A very important parameter for the “bohb” subset was the surrogate_learner. We can already assume that “knn1” is the best surrogate_learner, since many of the other surrogate_learners were filtered out of the top 20% dataset. But let us check this with a PDP.
plotPartialDependence(bohbSubset, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
In both subsets, knn1 is actually the best choice based on the PDP. There does not seem to be much difference in the other parameters. For a more detailed analysis, we should split the data into the individual surrogate learners and see if there are differences in the importance of the other parameters. Although it would be interesting to analyze the learners for both samples separately, we focus on the whole dataset to make it less complicated and because the importance of the subsets does not differ much.
knn1Surrogate <- smashy_super[smashy_super$surrogate_learner == "knn1",]
knn7Surrogate <- smashy_super[smashy_super$surrogate_learner == "knn7",]
bohblrnSurrogate <- smashy_super[smashy_super$surrogate_learner == "bohblrn",]
rangerSurrogate <- smashy_super[smashy_super$surrogate_learner == "ranger",]
knn1Subset <- TaskRegr$new(id = "knn1Task", backend = knn1Surrogate, target = "yval")
knn7Subset <- TaskRegr$new(id = "knn7task", backend = knn7Surrogate, target = "yval")
bohblrnSubset <- TaskRegr$new(id = "bohblrnTask", backend = bohblrnSurrogate, target = "yval")
rangerSubset <- TaskRegr$new(id = "rangerTask", backend = rangerSurrogate, target = "yval")
plotImportance(knn1Subset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(knn7Subset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(bohblrnSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(rangerSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
The parameters sample and random_interleave_fraction are the most important for “knn1”, “knn7” and “ranger”. For “bohblrn”, the parameter survival_fraction is more important than random_interleave_fraction. The parameter filter_with_max_budget has barely an effect for any learner except “knn1”. These are the parameters we should check more closely.
The most important parameter for nearly all surrogate_learners is sample.
plotPartialDependence(knn1Subset, "sample", rug = FALSE)
plotPartialDependence(knn7Subset, "sample", rug = FALSE)
plotPartialDependence(bohblrnSubset, "sample", rug = FALSE)
plotPartialDependence(rangerSubset, "sample", rug = FALSE)
We already knew that “random” is better on average, but now we also know that this holds for all surrogate_learners.
plotPartialDependence(knn1Subset, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(knn7Subset, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(bohblrnSubset, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(rangerSubset, "random_interleave_fraction", plotICE = FALSE)
For the parameter random_interleave_fraction, higher values always seem to be better. For “knn1” and “knn7”, low random_interleave_fraction values seem to have a stronger negative impact on the prediction than for “ranger” or “bohblrn”. For the surrogate_learners “knn1” and “bohblrn”, the maximum results in slightly worse predicted performance, but since there are few instances there, this is not certain. Values between 0.75 and 0.95 can be considered optimal for this parameter.
Another important parameter for all surrogate_learners is survival_fraction. Moreover, for “bohblrn” the parameter survival_fraction was noticeably more important than for the other learners. That is why we look at this parameter next.
plotPartialDependence(knn1Subset, "survival_fraction")
plotPartialDependence(knn7Subset, "survival_fraction")
plotPartialDependence(bohblrnSubset, "survival_fraction")
plotPartialDependence(rangerSubset, "survival_fraction")
Low values of survival_fraction are better in general for the learners “knn1” and “knn7”. For “knn1” a value close to 0 and for “knn7” a value between 0.05 and 0.15 should be considered. For “bohblrn”, values between 0.25 and 0.35, and for “ranger”, values between 0.15 and 0.25 seem to produce the best predicted performances.
The last parameter we want to check is filter_with_max_budget. It was only important for “knn1” and not for the other learners.
plotPartialDependence(knn1Subset, "filter_with_max_budget")
plotPartialDependence(knn7Subset, "filter_with_max_budget")
plotPartialDependence(bohblrnSubset, "filter_with_max_budget")
plotPartialDependence(rangerSubset, "filter_with_max_budget")
When we compared the importance across the surrogate_learners, we found that the filter_with_max_budget parameter was only important for “knn1”. Here we can see that for “knn1” the parameter should be set to “TRUE”. For the other learners it indeed does not matter whether the parameter is set to “TRUE” or “FALSE”.
When we compared the summary of the full dataset with that of the top 20% of configurations, we saw that both “random” and “bohb” samples remained, and that mostly “knn1” learners were left. To see whether good results are still possible with the other learners, let us look at the maximum values for all learners.
summary(superBest$surrogate_learner)
## bohblrn knn1 knn7 ranger
## 2 546 19 2
aggregate(x = superBest$yval,
by = list(superBest$surrogate_learner),
FUN = max)
## Group.1 x
## 1 bohblrn -0.2117795
## 2 knn1 -0.2170470
## 3 knn7 -0.2105208
## 4 ranger -0.2124898
It is interesting to see that the best configuration of each of the learners that were filtered out in large numbers achieves a better yval than the best “knn1” configuration. This is important because it shows that it is indeed possible to achieve good results with all learners, not only with “knn1”. But “knn1” achieves the best results on average, which means that this learner is more robust: changes in its configuration do not have as large a negative impact on performance as for the other learners.
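The “best on average” argument can be checked with the same `aggregate` call using `mean` instead of `max`. A toy sketch with made-up yval values (not from smashy_super) shows how group maxima and group means can disagree:

```r
# Toy data: group maxima vs. group means can point to different learners
df <- data.frame(
  learner = rep(c("knn1", "knn7"), each = 4),
  yval    = c(-0.216, -0.217, -0.218, -0.217,   # "knn1": consistently good
              -0.211, -0.230, -0.232, -0.235)   # "knn7": best single run, worse mean
)

best_case <- aggregate(x = df$yval, by = list(learner = df$learner), FUN = max)
average   <- aggregate(x = df$yval, by = list(learner = df$learner), FUN = mean)

best_case  # "knn7" holds the single best value
average    # "knn1" is better on average, i.e. more robust
```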
We also want to investigate the best cases and for this we directly check the subdivided datasets. Let us investigate “knn1” a bit more. Because we have less data, we can also make use of a parallel coordinate plot.
knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1",]
knn1BestTask <- TaskRegr$new(id = "task", backend = knn1Best, target = "yval")
plotParallelCoordinate(knn1BestTask, labelangle = 10)
plotImportance(knn1BestTask)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
In the PCP it can be seen that filter_with_max_budget should be set to “TRUE”, random_interleave_random to “FALSE”, and random_interleave_fraction should be high for good results.
According to the importance plot, the parameters filter_factor_first and filter_factor_last are very important as well and should be examined further.
plotPartialDependence(knn1BestTask, "filter_factor_first" )
plotPartialDependence(knn1BestTask, "filter_factor_last")
In the PDP we can see that filter_factor_first should be high and that filter_factor_last has the best outcome for values between 1.5 and 2.5 or above 6.
Another very important parameter, both for the “random” subset and for the filtered dataset, is budget_log_step. First, let us investigate the parameter with a PDP for both subsets.
plotPartialDependence(bohbSubset, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)
For the “random” subset, higher values produce better outcomes. For the “bohb” subset there are two peaks, around -0.5 and 0.5. To find reasons for the two peaks, let us focus on the top 20% again.
plotPartialDependence(bohbBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = TRUE)
plotPartialDependence(randomBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = TRUE)
Similar to the survival_fraction parameter, configurations with a low value seem to have a positive rather than a negative effect on performance if the other parameters are set correctly. This could be the reason why there are two peaks for the “bohb” sample.
If we look at low values only, we can see that the predicted performance varies a lot and that the other parameter configurations are responsible. We choose budget_log_step values below -1.4 to get fewer than 150 configurations.
budgetSubset <- random[random$budget_log_step < -1.4,]
budgetSubsetTask <- TaskRegr$new(id = "budgetSubsetTask", backend = budgetSubset, target = "yval")
plotParallelCoordinate(budgetSubsetTask, labelangle = 10)
In the PCP we can see that good values are often obtained with the “knn1” learner. A low survival_fraction is also important, whereas random_interleave_fraction should be high.
Another possibility is to look at a two-dimensional partial dependence plot. We compare budget_log_step with the two parameters we found in the PCP.
plotPartialDependence(randomSubset, features = c("budget_log_step", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(randomSubset, features = c("budget_log_step", "random_interleave_fraction"), rug = FALSE, gridsize = 10)
We can see that high values are less prone to poor performance when the other parameters are poorly configured. Conversely, it is also possible to achieve good values when budget_log_step is low and the other parameters are well configured.
random_interleave_fraction can vary between 0 and 1. This parameter was highly important in both subsets and was also the most important parameter for the best 20% of configurations. Therefore, it is really worthwhile to check this parameter.
plotPartialDependence(bohbSubset, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)
A good choice for random_interleave_fraction in the “bohb” samples is a high value; a good range seems to be between 0.75 and 0.95. For the “random” samples, a high value between 0.5 and 0.75 seems to produce the best performances.
plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20)
plotPartialDependence(randomBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20)
The filtered dataset shows that low values do not have such a strong negative impact on the outcome, but high values are better. A value above 0.5 should be chosen.
The parameter filter_factor_last was only of mediocre importance, but a quick check is worthwhile as well.
plotPartialDependence(bohbSubset, "filter_factor_last", plotICE = FALSE, gridsize = 40)
plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE, gridsize = 40)
plotPartialDependence(randomSubset, "filter_factor_last", plotICE = FALSE, gridsize = 40)
plotPartialDependence(randomBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE, gridsize = 40)
filter_factor_last fluctuates a lot, which is why we choose a higher gridsize. When the fluctuations increase, the importance increases as well, even though the range of predicted performances is not really large. The value of filter_factor_last should be between 1.5 and 2.5, or above 5.5 for “bohb” samples and between 5 and 5.5 for “random” samples.
plotPartialDependence(bohbSubset, "filter_with_max_budget", rug = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_with_max_budget"), rug = FALSE)
plotPartialDependence(randomSubset, "filter_with_max_budget", rug = FALSE)
plotPartialDependence(randomBestTask, features = c("filter_with_max_budget"), rug = FALSE)
The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.
The parameter filter_select_per_tournament had barely an effect in the general case but became a little more important in the top 20% of configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.
plotPartialDependence(bohbSubset, features = c("filter_select_per_tournament"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("filter_select_per_tournament"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomBestTask, features = c("filter_select_per_tournament"), rug = FALSE, plotICE = FALSE)
The effect is weak and maybe comes from the peaks around 1 to 1.3. The parameter should probably be chosen around 1 or slightly above, but the choice should not matter much.
The parameter filter_factor_first had barely an effect in the general case but became a little more important in the top 20% of configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.
plotPartialDependence(bohbSubset, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(randomBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)
The parameter filter_factor_first shows interesting differences between the general and the subdivided case. While in the general case the performance drops considerably for values above 6, in the subset these values show the best performance. Since in the subset the majority of good cases lie in this area, it seems to be a good choice to pick a value above 6.